VA-Files vs. R*-Trees in Distance Join Queries

نویسندگان

  • Antonio Corral
  • Alejandro D'Ermiliis
  • Yannis Manolopoulos
  • Michael Vassilakopoulos
چکیده

In modern database applications the similarity of complex objects is examined by performing distance-based queries (e.g. nearest neighbour search) on data of high dimensionality. Most multidimensional indexing methods have failed to efficiently support these queries in arbitrary high-dimensional datasets (due to the dimensionality curse). Similarity join queries and K closest pairs queries are the most representative distance join queries, where two highdimensional datasets are combined. These queries are very expensive in terms of response time and I/O activity in case of high-dimensional spaces. On the other hand, the filtering-based approach, as applied by the VA-file, has turned out to be a very promising alternative for nearest neighbour search. In general, the filtering-based approach represents vectors as compact approximations, whereas by first scanning these approximations, only a small fraction of the real vectors is visited. Here, we elaborate on VA-files and develop VA-file based algorithms for answering similarity join and K closest pairs queries on highdimensional data. Also, performance-wise we compare the use of VA-files and R*-trees (a structure that has been proven to be of robust nature) for answering these queries. The results of the comparison do not lead to a clear winner.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive and Incremental Processing for Distance Join Queries

A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in on-line query processing or internet search environments. These requirements pose new challenges as well as opportunities for more efficient processing of spati...

متن کامل

Cost models for distance joins queries using R-trees

The K-Closest-Pairs Query (K-CPQ), a type of distance join in spatial databases, discovers the K pairs of objects formed from two different datasets with the K smallest distances. Recently, branch-and-bound algorithms based on R-trees have been developed in order to answer K-CPQs efficiently. For query optimization purposes, analytical models are needed to estimate the processing cost of a spec...

متن کامل

Similarity Search on Bregman Divergence: Towards Non-Metric Indexing

In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman Divergence [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance, are special cases of Bregman divergence, with wide applications in statistics, speech recognition ...

متن کامل

Optimization of spatial join using constraints based- clustering techniques

Spatial joins are used to combine the spatial objects. The efficient processing depends upon the spatial queries. The execution time and input/output (I/O) time of spatial queries are crucial, because the spatial objects are very large and have several relations. In this article, we use several techniques to improve the efficiency of the spatial join; 1. We use R*-trees for spatial queries sinc...

متن کامل

Of Snowstorms and Bushy Trees

Many workloads for analytical processing in commercial RDBMSs are dominated by snowstorm queries, which are characterized by references to multiple large fact tables and their associated smaller dimension tables. This paper describes a technique for bushy join tree optimization for snowstorm queries in Oracle database system. This technique generates bushy join trees containing subtrees that pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005